Abstract

Unlike theoretical scale-free networks, most real networks present multi-scale behavior, with nodes
structured into different types of functional groups and communities. While the majority of approaches for
the classification of nodes in a complex network have relied on local measurements of the topology/connectivity
around each node, valuable information about node functionality can be obtained by concentric (or hierarchical)
measurements. In this paper we explore the possibility of using a set of concentric measurements and agglomerative
clustering methods in order to obtain a set of functional groups of nodes. The concentric clustering coefficient and
the convergence ratio are chosen as segregation parameters for the analysis of an institutional collaboration network
including various known communities (departments of the University of Sao Paulo). A dendrogram is obtained,
and the results are analyzed and discussed. Among the interesting findings, we emphasize the scale-free
nature of the obtained network, as well as the identification of different patterns of authorship emerging from
different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform
distribution of hubs along the concentric levels, in contrast to the non-uniform pattern found in theoretical
scale-free networks such as the BA model.



1 Introduction 

One of the inherent features of complex networks
concerns their structured patterns of connectivity,
which depart from the largely uniform degree
distribution found in random graphs [1-4]. It is
such complex connectivity, found in some real
and theoretical networks, that gives rise to
interesting structural elements such as communities and
scale-free node degree distributions [5]. Though
such patterns can sometimes be identified by
considering only simple features such as the node
degrees, more information can be obtained by
considering additional measurements [6]. Indeed,
some types of communities can be overlooked
when only such simple measurements are considered. Even
more information about the heterogeneity of
network connectivity can be provided by
concentric (or hierarchical) measurements,
obtained by taking into account successive
neighborhoods around each node [7-9]. This
possibility has been explored in preliminary work. In [10],
those measurements were used to
obtain information about the topological
features of the networks as a whole. The results
showed distinct behaviors for real and grown
networks, with the latter often exhibiting a mixture
of features typical of different models. That
paper also illustrated the possibility of clustering
groups of nodes with similar concentric
connectivity.

The current work extends such preliminary
investigations in a more systematic and formal way.
More specifically, we adopt a sound way to
measure the similarity between the distributions
of concentric measurements, namely by calculating
the Spearman rank correlation between those
features. Compared to the previously adopted
Euclidean distances, this approach is less
sensitive to the absolute values of the measurements.
The potential of this approach is illustrated with
respect to the important problem of scientific
collaboration, as quantified by co-authorship, among
the staff of the largest Brazilian university, namely
the University of Sao Paulo (USP). A dataset of scientific
publications covering the period between 2003 and 200^3 was considered
in order to build a collaborative network in which
each node corresponds to a staff member, while
the links are provided by co-authorships in
publications indexed by forty libraries integrated with
SIBi-USP. Interestingly, the original dataset also
included the respective affiliations of each author, so
that a preliminary identification of possible
communities (departments of USP) was available for
use as a reference.

A series of concentric measurements were
calculated from this network, and their average
and standard deviation values were compared with those of
theoretical models (Erdos-Renyi, ER, and Barabasi-Albert,
BA). Subsequently, the new methodology
for node classification was applied in order to
organize the network nodes into clusters, possibly
corresponding to the communities existing in the
network. An average-based clustering
algorithm [11, 12] was used to obtain this
clustering. A series of interesting results was
obtained. First, as could be expected, we found
that the collaborative network exhibits a scale-free-like
distribution of node degrees. Among the
several considered concentric measurements, the
convergence ratio and the concentric clustering coefficient
were found to contribute particularly to the
discrimination between the nodes, and were
consequently adopted for the clustering of nodes.
When the obtained clusters were compared with
the original institutional departments, a more
definite correspondence was found in the case of
exact sciences. A less clear adherence to the
original departments was found for humanities and
biological sciences. Such findings suggest a more
localized pattern of co-authorship in the case of
exact sciences. Another interesting finding regards
several deviations of specific properties
between the collaborative network and the BA
theoretical model. For instance, the real network was
found to have larger values of average shortest
path length, indicating a larger effective network size when
compared with the BA counterpart. In addition, the
information provided by the convergence ratio
suggests that the large size of this network is a
consequence of the uniform distribution of hubs along
the concentric levels, where the hubs tend to be
connected to low-degree nodes, whereas in the BA case
the hubs are almost all connected to one another.

2 Basic Concepts and Models of Networks

Consider an undirected, weighted network $\Gamma$
defined by $N$ nodes and a set of $K$ weighted
edges connecting those nodes. $\Gamma$ can be
completely specified by an adjacency matrix $G$ with
elements $G_{ij} = G_{ji}$ (i.e. a symmetric matrix), where
the strength of the connection between node $i$ and
node $j$ is $G_{ij}$ (i.e. the value of the matrix at the $i$-th row
and $j$-th column), and a null value represents no
connection.

Nodes can be characterized by the traditional
immediate-neighborhood features, i.e. the node
degree and the clustering coefficient. For weighted
networks, the node degree of a node $i$, represented
by $k_i$, is defined as the sum of the weights
of all edges that connect $i$ to other nodes. More
specifically, considering the adjacency matrix
representation $G_{ij}$, the node degree can be calculated
as:

$$k_i = \sum_{j=1}^{N} G_{ij} \qquad (1)$$

The clustering coefficient of a node $i$,
abbreviated as $cc(i)$, is defined as the number of
connections, $e_1(i)$, among the immediate
neighbors of $i$ divided by the maximum possible
number of connections among those nodes. Let $n_1(i)$ be
the number of immediate neighbors of
$i$. The clustering coefficient can then be calculated
as:

$$cc(i) = \frac{2\, e_1(i)}{n_1(i)\,\big(n_1(i) - 1\big)} \qquad (2)$$
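As a small check of Equations (1) and (2), the following Python sketch computes the weighted degree and the topological clustering coefficient from a dense adjacency matrix stored as a list of lists. The matrix `G` is a hypothetical toy example, and returning 0 for nodes with fewer than two neighbors is an assumed convention, since Equation (2) is undefined in that case.

```python
# Hypothetical toy weighted adjacency matrix: G[i][j] = G[j][i] = weight of edge i-j.
G = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 3],
    [0, 0, 3, 0],
]

def weighted_degree(G, i):
    """Equation (1): k_i is the sum of the weights in row i."""
    return sum(G[i])

def clustering_coefficient(G, i):
    """Equation (2), computed on the topology only (any non-zero entry counts as an edge)."""
    neighbours = [j for j, w in enumerate(G[i]) if w != 0 and j != i]
    n1 = len(neighbours)
    if n1 < 2:
        return 0.0  # assumed convention: Equation (2) is undefined here
    e1 = sum(
        1
        for a in range(n1)
        for b in range(a + 1, n1)
        if G[neighbours[a]][neighbours[b]] != 0
    )
    return 2.0 * e1 / (n1 * (n1 - 1))

print([weighted_degree(G, i) for i in range(len(G))])  # [3, 3, 5, 3]
print(clustering_coefficient(G, 2))  # neighbours 0, 1, 3; only 0-1 linked, so about 1/3
```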

This paper considers two theoretical network
models for comparison purposes, namely the
Erdos-Renyi model (random networks) and the Barabasi-Albert
model (a scale-free model). An Erdos-Renyi (ER) network
is created by connecting each
pair of nodes with a constant probability [...],
resulting in a binomial distribution of node
degrees. A Barabasi-Albert (BA) network is created
by starting with a non-connected network with
$m_0$ nodes and then progressively adding new nodes,
each with $m$ new connections to the nodes
already in the network. The probability that a new
edge attaches to an existing node is proportional to the
degree of that node. This procedure
results in a scale-free network, whose node
degree distribution follows a power law and shows
the presence of hubs.

3 Concentric Measurements 

A ring $R_d(i)$, representing the nodes at the
concentric level $d$ centered at node $i$, is defined as
the subgraph containing all nodes whose shortest-path
distance from node $i$ is $d$. A network with
three concentric levels is illustrated in Figure 1,
where the rings $R_d(1)$ (i.e. centered at node 1) of
levels $d = 1$, $d = 2$ and $d = 3$ are represented by
concentric circles, with $R_0(1) = \{1\}$, $R_1(1) = \{2,
3, 4\}$, $R_2(1) = \{5, 6, 8\}$ and $R_3(1) = \{9, 10, 11, 12, 13,
14\}$.

The concept of concentric levels complements
the traditional measurements: rather than focusing
on the local topological properties of nodes, it
takes into account their successive
neighborhoods. In general, the concentric
measurements are calculated by considering relationships
between the nodes and edges at two or more
concentric levels. The following measurements can
be naturally generalized to weighted networks with
minor modifications.

The concentric node degree, $k_d(i)$, of a reference
node $i$ at concentric level $d$ is defined as
the number of edges connecting
the nodes in $R_d(i)$ to those in $R_{d+1}(i)$. As an example,
for Figure 1 we have $k_0(1) = 3$, $k_1(1) = 5$ and
$k_2(1) = 7$. Note that the concentric node degree is
not an average value taken over the
nodes in $R_d(i)$. This measurement is the direct
extension of the well-known node degree in which the
reference node is replaced by the set of nodes inside
the ball $B_d(i)$ (i.e. the ball containing the nodes in
rings 0 to $d$). The concentric node degree can be
extended to weighted networks by taking the sum
of the weights of every connection between
these nodes and the nodes at the next level.

Figure 1: A small network with 3 concentric levels
considering node 1 as reference.
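The ring decomposition and the concentric node degree can be sketched with a breadth-first search, which yields shortest-path distances in an unweighted graph. The Python below is a minimal topological illustration on a hypothetical seven-node graph, not the network of Figure 1; for the weighted variant, the edge count in `concentric_degree` would become a sum of weights. Nodes unreachable from the reference node are simply ignored, which is an assumption for disconnected networks.

```python
from collections import deque

def rings(adj, i):
    """Return [R_0(i), R_1(i), ...]: R_d(i) is the set of nodes at shortest-path distance d from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:  # breadth-first search gives hop distances in an unweighted graph
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    R = [set() for _ in range(max(dist.values()) + 1)]
    for node, d in dist.items():
        R[d].add(node)
    return R

def concentric_degree(adj, i):
    """k_d(i): number of edges from R_d(i) to R_{d+1}(i); the outermost ring has no successor, so 0."""
    R = rings(adj, i)
    ks = []
    for d, ring in enumerate(R):
        nxt = R[d + 1] if d + 1 < len(R) else set()
        ks.append(sum(1 for u in ring for v in adj[u] if v in nxt))
    return ks

# Hypothetical toy graph (not the network of Figure 1), stored as adjacency sets.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5), (4, 6), (5, 6), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print([sorted(r) for r in rings(adj, 1)])  # [[1], [2, 3, 4], [5, 6], [7]]
print(concentric_degree(adj, 1))           # [3, 3, 1, 0]
```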

The concentric clustering coefficient, $cc_d(i)$, is the
immediate generalization of the traditional
clustering coefficient and considers only the nodes
and connections of the ring $R_d(i)$. It is defined
in Equation (3), where the number of edges in $R_d(i)$
is expressed as $e_d(i)$ and the number of nodes
in $R_d(i)$ is represented as $n_d(i)$:

$$cc_d(i) = \frac{2\, e_d(i)}{n_d(i)\,\big(n_d(i) - 1\big)} \qquad (3)$$

For node $i = 1$ in the network shown in Figure 1,
we have $cc_1(1) = 1/3$, $cc_2(1) = 0$ and $cc_3(1) =
1/5$.
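Equation (3) can be illustrated with the same kind of breadth-first ring decomposition. The sketch below repeats a compact `rings` helper so that it runs on its own, uses a hypothetical seven-node toy graph (not Figure 1), and returns 0 for rings with fewer than two nodes, which is an assumed convention.

```python
from collections import deque

def rings(adj, i):
    """R_d(i) for d = 0, 1, ...: nodes at shortest-path distance d from i (breadth-first search)."""
    dist, queue = {i: 0}, deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    R = [set() for _ in range(max(dist.values()) + 1)]
    for node, d in dist.items():
        R[d].add(node)
    return R

def concentric_clustering(adj, i):
    """cc_d(i) = 2 e_d(i) / (n_d(i) (n_d(i) - 1)), Equation (3); 0 when n_d(i) < 2 (assumption)."""
    out = []
    for ring in rings(adj, i):
        n = len(ring)
        e = sum(1 for u in ring for v in adj[u] if v in ring) // 2  # each intra-ring edge is seen twice
        out.append(2.0 * e / (n * (n - 1)) if n > 1 else 0.0)
    return out

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5), (4, 6), (5, 6), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(concentric_clustering(adj, 1))  # rings {1}, {2,3,4}, {5,6}, {7}: about [0, 1/3, 1, 0]
```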
Other interesting concentric measurements
which can be obtained with respect to the reference
node $i$, and used to complement the
characterization of complex networks, include the
following:

Convergence ratio ($C_d(i)$): corresponds to the
ratio between the concentric node degree of node
$i$ at distance $d$ and the number of nodes in the
next ring, i.e.

$$C_d(i) = \frac{k_d(i)}{n_{d+1}(i)} \qquad (4)$$

This measurement quantifies the average number
of edges received by each node in ring $d+1$ from ring $d$.
In an unweighted network without self-loops, $C_0(i) = 1$ for any
reference node $i$, since every node in $R_1(i)$ receives exactly one edge from $i$.
In the case illustrated in Figure 1, we have $C_0(1) = 1$, $C_1(1) = 5/4$
and $C_2(1) = 1$.

Intra-ring degree ($A_d(i)$): this measurement is
obtained by taking the average of the degrees
of the nodes in the subnetwork $R_d(i)$. Observe that
only the edges between nodes in this
subnetwork are considered, therefore overlooking the
connections established by these nodes with the
nodes in the rings at levels $d-1$ and $d+1$. For
instance, for the situation in Figure 1 we have
$A_1(1) = 1/3$, $A_2(1) = $ … and $A_3(1) = 1/2$. For
weighted networks, the intra-ring degree is the
average of the weights of all nodes at the rings
$R_{d-1}$ and $R_{d+1}$.

Inter-ring degree ($E_d(i)$): this measurement
corresponds to the average number of
connections between each node in the ring $R_d(i)$ and
those in $R_{d+1}(i)$. For instance, for Figure 1 we
have $E_0(1) = 3$, $E_1(1) = 5/3$ and $E_2(1) = 3/2$.
Observe that $E_d(i) = k_d(i)/n_d(i)$.

Concentric common degree ($H_d(i)$): equal to the
average node degree among the nodes in $R_d(i)$,
considering all edges in the original network. For
Figure 1 we have $H_0(1) = 1$, $H_1(1) = 10/3$ and
$H_2(1) = 16/7$. The concentric common degree
expresses the average node degree at each concentric
level, indicating how the network node degrees
are distributed along the network hierarchies.
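The four measurements listed above can be computed in one pass over the rings. The sketch below is a topological (unweighted) reading of the definitions: $C_d$ from Equation (4), $A_d$ as the mean degree inside the ring subnetwork ($2e_d/n_d$), $E_d = k_d/n_d$, and $H_d$ as the mean full-network degree of the ring. The toy graph and function names are hypothetical, and $C_d$ is reported only for levels that have a non-empty next ring.

```python
from collections import deque

def rings(adj, i):
    """R_d(i) for d = 0, 1, ...: nodes at shortest-path distance d from i (breadth-first search)."""
    dist, queue = {i: 0}, deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    R = [set() for _ in range(max(dist.values()) + 1)]
    for node, d in dist.items():
        R[d].add(node)
    return R

def ring_measurements(adj, i):
    """Return C_d, A_d, E_d and H_d of reference node i as lists indexed by level d."""
    R = rings(adj, i)
    C, A, E, H = [], [], [], []
    for d, ring in enumerate(R):
        n_d = len(ring)
        nxt = R[d + 1] if d + 1 < len(R) else set()
        k_d = sum(1 for u in ring for v in adj[u] if v in nxt)    # concentric node degree
        intra = sum(1 for u in ring for v in adj[u] if v in ring)  # equals 2 e_d(i)
        if nxt:
            C.append(k_d / len(nxt))                    # convergence ratio, Equation (4)
        A.append(intra / n_d)                           # intra-ring degree
        E.append(k_d / n_d)                             # inter-ring degree
        H.append(sum(len(adj[u]) for u in ring) / n_d)  # concentric common degree
    return C, A, E, H

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5), (4, 6), (5, 6), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

C, A, E, H = ring_measurements(adj, 1)
print(C)  # [1.0, 1.5, 1.0]
print(E)  # [3.0, 1.0, 0.5, 0.0]
```

As expected for an unweighted graph, the first convergence ratio equals 1.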

Table 1 summarizes the concentric measurements
used in this paper, all of which are
defined with respect to one of the network nodes,
identified by $i$, taken as a reference, and at a
distance $d$ from that node.
Symbol       Measurement
$E_d(i)$     inter-ring node degree of node $i$ at distance $d$
$H_d(i)$     concentric common degree of node $i$ at distance $d$
$cc_d(i)$    concentric clustering coefficient of node $i$ at distance $d$
$C_d(i)$     convergence ratio at concentric level $d$

Table 1: The concentric measurements considered
in the current article.



4 Statistical Concepts 

Two statistical methodologies are used in the
present paper in order to obtain groups of nodes
with similar concentric measurements. The first
step is to choose a distance measure [11,
12], i.e. a quantification of how similar two nodes are in terms
of a set of measurements. Because of the varying forms of
the concentric measurement distributions, a
non-parametric measure such as the Spearman rank
correlation coefficient should be adopted.

4.1 Spearman Rank Correlation. 

The Spearman rank correlation coefficient is a
statistical measurement quantifying how strongly
two random variables tend to vary
together. Unlike the Pearson correlation coefficient,
this measurement is not restricted to linear joint
variations (it captures any monotonic relationship) and can be used to quantify the
similarity between the forms of two curves (or data sets).
In fact, the Spearman rank correlation is a special
case of the Pearson correlation in which every value
of the curve is replaced by its rank before the
coefficient is calculated.

Given two normalized distributions of two
discrete random variables, $X$ and $Y$, the Pearson
correlation coefficient between them is defined as the
covariance of those two variables divided by the product of their
respective standard deviations, i.e.:

$$r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X\, \sigma_Y} \qquad (5)$$

Given $n$ samples of the random variables $X$ and
$Y$, henceforth expressed as $x_i$ and $y_i$, the respective
Pearson correlation coefficient can be estimated
as:

$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, \sigma_X\, \sigma_Y} \qquad (6)$$

The Spearman rank correlation coefficient $\rho$ can
be obtained by replacing the $X$ and $Y$ values by their
ranks $X^*$ and $Y^*$. For example,
considering a data set with $X = \{1.1, 3, 0.5, 100\}$ and
$Y = \{2, 0.8, 1, 0.1\}$, the ranked data set will be
$X^* = \{2, 3, 1, 4\}$ and $Y^* = \{4, 2, 3, 1\}$, and the
Spearman rank coefficient will be $\rho = r_{X^*Y^*}$.
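The rank-then-correlate procedure can be sketched in plain Python as below, reproducing the worked example. The average-rank treatment of ties is a common convention assumed here (the text does not discuss ties), and the function names are illustrative; `scipy.stats.spearmanr` offers a library implementation.

```python
def ranks(values):
    """Rank data from 1 (smallest) to n; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            r[order[k]] = (pos + end) / 2 + 1  # average 1-based rank of the tied block
        pos = end + 1
    return r

def pearson(x, y):
    """Equation (6); the (n - 1) factors of the covariance and standard deviations cancel."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """rho = r_{X*Y*}: Pearson correlation of the ranked data."""
    return pearson(ranks(x), ranks(y))

X = [1.1, 3, 0.5, 100]
Y = [2, 0.8, 1, 0.1]
print(ranks(X), ranks(Y))  # [2.0, 3.0, 1.0, 4.0] [4.0, 2.0, 3.0, 1.0]
print(spearman(X, Y))      # approximately -0.8
```

Without ties this agrees with the familiar shortcut $\rho = 1 - 6\sum d_i^2 / (n(n^2 - 1))$: the rank differences are $-2, 1, -2, 3$, so $\rho = 1 - 6 \cdot 18 / 60 = -0.8$.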

5 Methodology 

In order to illustrate the use of concentric
measurements and node classification, we consider
a collaboration network obtained from real data,
which is compared with Barabasi-Albert (BA)
and Erdos-Renyi (ER) counterparts.

The collaboration network, presented in this
work for the first time, was created by collecting
data about co-authorship in published
articles, where each node represents an author and
each undirected edge connects two authors who
wrote at least one paper together. Because the same
two authors may have written more than one paper
together, the edges are weighted
by the number of papers
that those authors co-wrote. This network
was obtained from the library database of the
"Universidade de Sao Paulo". In addition to the
collaboration information, every node was labeled with
the department of the corresponding author. The resulting
network has 5630 nodes, an average
topological node degree of $\langle k_t \rangle \approx 15$ and an average
strength of $\langle k \rangle \approx 40$.
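The construction described above can be sketched as follows, assuming a hypothetical input in which each paper is given as a list of author identifiers; the function names are illustrative and not part of the original pipeline. Each pair of co-authors receives an edge whose weight counts their joint papers, from which the average topological degree $\langle k_t \rangle$ and the average strength $\langle k \rangle$ follow. Authors who never co-authored a paper do not appear as nodes in this sketch.

```python
from collections import Counter
from itertools import combinations

def coauthorship_network(papers):
    """papers: iterable of author lists (hypothetical format).
    Returns a Counter mapping each unordered author pair to the number of papers they co-wrote."""
    weights = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):  # canonical pair order, no duplicates
            weights[(a, b)] += 1
    return weights

def average_degree_and_strength(weights):
    """Average topological degree <k_t> and average strength <k> over the authors with edges."""
    degree, strength = Counter(), Counter()
    for (a, b), w in weights.items():
        degree[a] += 1
        degree[b] += 1
        strength[a] += w
        strength[b] += w
    n = len(degree)
    return sum(degree.values()) / n, sum(strength.values()) / n

papers = [["ana", "bia"], ["ana", "bia", "caio"], ["caio"]]  # hypothetical records
w = coauthorship_network(papers)
print(w[("ana", "bia")])               # 2
print(average_degree_and_strength(w))  # degree 2.0, strength 8/3
```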

The simulated networks, of types BA and ER,
were obtained by the classical methods [3].
Random (ER) networks were generated by selecting
edges with uniform probability $p$, while the BA
networks were grown by starting with $m_0$
randomly interconnected nodes and adding new
nodes with $m$ edges each, which attach to the
existing nodes with probability proportional to their
respective node degrees. The networks were
created with 5000 nodes and average node degree
$\langle k \rangle \approx 16$ for BA and $\langle k \rangle \approx 15$ for ER.
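The two generative procedures can be sketched in plain Python (standard library only). In this sketch the $m_0$ seed nodes of the BA model are fully interconnected, which is an assumption, since the text only says they are randomly interconnected; the function names and the `seed` parameter are illustrative.

```python
import random

def _link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)

def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                _link(adj, u, v)
    return adj

def barabasi_albert(n, m, m0=None, seed=None):
    """Each new node links to m distinct existing nodes chosen with probability proportional
    to their degree. Seed nodes are fully interconnected here (assumption)."""
    rng = random.Random(seed)
    m0 = m + 1 if m0 is None else m0
    if m0 < max(m, 2):
        raise ValueError("need m0 >= max(m, 2)")
    adj = {u: set() for u in range(n)}
    endpoints = []  # node u appears deg(u) times, so uniform picks here are degree-proportional
    for u in range(m0):
        for v in range(u + 1, m0):
            _link(adj, u, v)
            endpoints += [u, v]
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for v in chosen:
            _link(adj, new, v)
            endpoints += [new, v]
    return adj

ba = barabasi_albert(300, 3, m0=4, seed=1)
print(sum(len(s) for s in ba.values()) // 2)  # 6 seed edges + 296 * 3 = 894
```

With $m = 8$ the BA mean degree approaches $2m = 16$, and an ER graph with $\langle k \rangle \approx 15$ corresponds to $p = 15/(N-1)$. The pure-Python ER loop is quadratic in $N$, which is slow but workable for $N = 5000$.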

We started the analysis of the characteristics of
the real and theoretical networks by considering
the traditional node degree. Next, a collection of
concentric measurements was obtained for all the
networks, taking every node as the
center (reference), and then computing the average
value and the standard deviation at each level. The
considered measurements were the concentric node
degree, concentric clustering coefficient, intra-ring
degree, inter-ring degree, concentric common degree
and convergence ratio. The distributions of these
measurements obtained for the three networks
are compared in the next section.
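The averaging step can be sketched as follows: each reference node contributes a profile (one value per concentric level), and the mean and standard deviation are taken level by level. Two choices here are assumptions not fixed by the text: only nodes whose profile reaches level $d$ contribute to that level, and the population standard deviation is used. The profiles are hypothetical.

```python
from statistics import mean, pstdev

def level_statistics(profiles):
    """profiles: one list per reference node, entry d = measurement at concentric level d.
    Returns (mean, standard deviation) per level over the nodes that reach that level."""
    depth = max(len(p) for p in profiles)
    stats = []
    for d in range(depth):
        values = [p[d] for p in profiles if d < len(p)]
        stats.append((mean(values), pstdev(values)))
    return stats

# Hypothetical profiles for three reference nodes with different numbers of levels.
profiles = [[3, 3, 1], [2, 4], [1, 2, 3, 4]]
for d, (m, s) in enumerate(level_statistics(profiles)):
    print(f"level {d}: {m:.2f} +/- {s:.2f}")
```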

While the distributions of the concentric
measurements support a global characterization
of the networks, they do not convey information
about the concentric characteristics of individual
nodes. This information can be obtained from
the concentric measurements of each node
along the several levels centered at that node. In
order to classify the nodes into groups with
similar concentric features, the agglomerative
hierarchical clustering algorithm, using the Spearman rank
coefficient as the similarity measure, was applied to
the individual node concentric clustering
coefficients and convergence ratios. These data were
obtained only for the collaboration network. The
resulting tree (called a dendrogram) was truncated
so as to yield eight groups of nodes with similar
properties.
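The clustering step can be sketched as a naive average-linkage agglomeration that uses $1 - \rho$ as the distance between two nodes' concentric profiles. This is a minimal illustration under stated assumptions: the profiles have equal length (e.g. zero-padded to a common number of levels), a constant profile is assigned $\rho = 0$, and "merge until `n_clusters` groups remain" plays the role of truncating the dendrogram (eight groups in the text). All names are illustrative.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            r[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0  # assumed value for constant profiles

def average_linkage(profiles, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance 1 - rho until n_clusters remain."""
    n = len(profiles)
    dist = [[1.0 - spearman(profiles[a], profiles[b]) for b in range(n)] for a in range(n)]
    clusters = [[a] for a in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = mean(dist[a][b] for a in clusters[x] for b in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x still refers to the same cluster
        clusters[x] = merged
    return clusters

# Hypothetical profiles: nodes 0 and 1 rise with the level, nodes 2 and 3 decay.
profiles = [[0, 1, 2, 3], [0, 2, 4, 8], [3, 2, 1, 0], [9, 5, 2, 1]]
print(average_linkage(profiles, 2))
```

For the full 5630-node network this cubic loop would be slow; SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance matrix, followed by `fcluster(Z, t=8, criterion="maxclust")`, is a common practical alternative.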

Because the nodes in the collaborative network are
labeled with the department of the
corresponding author, the effectiveness in the
segregation of those groups can be quantified in
terms of the percentage of nodes shared by
departments and clusters.

6 Results and Discussions 

This section begins by presenting the results
obtained for the concentric measurements
described in the methodology section and then
discusses these results while comparing the
collaborative network with the two theoretical models.
Finally, the agglomerative concentric clustering
results are shown as dendrograms, together with the average
concentric measurement distributions obtained for
each group. The effectiveness of the segregation
of labeled groups is presented in the form of pie
charts.

The traditional degree distributions
of the considered networks, shown in the
log-log curves of Figure 2, indicate that the collaboration
network (a) can be understood as a scale-free
network, like the BA model (b), because of the
power-law behavior of those curves. It is
interesting to note that the power-law exponents of
the two scale-free networks are distinct.

Figures 3 to 9 present the concentric
measurement distributions obtained for the three
networks while considering all nodes. The
asterisks indicate the average shortest path length
between pairs of nodes, which is
included in order to provide a reference for the
hierarchical analysis.

Figure 3 shows the concentric number of nodes
(average ± standard deviation) obtained for the
considered networks. All curves are characterized
by a peak. Interestingly, the collaboration network
presents a considerably smoother curve and a wider
peak when compared with the simulated
models. In addition, its high standard deviation
values suggest a wide variation of concentric features
among the nodes. The values of the concentric node
degrees, shown in Figure 4, are similar to the
respective values of the concentric number
of nodes.

The inter-ring degree curves, shown in Figure 5,
are monotonically decreasing after the first ring.
While these curves for the BA and ER models are
clearly distinct, the curve for the collaborative
network shows a mix of both behaviors. The
collaborative network curve begins with a constant value,
like that of the ER model, and then decreases
smoothly, like the curve for the BA model.
The curves obtained for the BA case show a peak
at the first ring, which is a consequence of the high
chance of finding a hub at that level. However,
this characteristic does not seem to be present in the
collaborative network. An explanation for this
effect is that the average topological distributions of
hubs in the collaborative network and in the BA model
are very different.

Figure 2: Average degree distributions for the three considered networks: (a) collaborative network,
(b) random ER model and (c) Barabasi-Albert BA model. ER and BA networks with average degree
$\langle k \rangle = 4$.

Figure 3: Hierarchical number of nodes (average
± standard deviation) for all considered networks,
which are identified above each graph. Observe
that most curves are characterized by a peak. The
average value of the shortest path between any
two nodes is marked by an asterisk.

Figure 4: Hierarchical node degrees obtained for
all the considered network models. The curves are
similar to those obtained for the concentric number
of nodes, except for an expected offset of one
level.

The results for the intra-ring degree, shown in
Figure 6, are very similar to those for the concentric number of
nodes, characterized by a peak, except
for the collaborative network, which presents
a wider peak centered at the left-hand side of the
graph.

Figure 7 shows the values of the concentric common
degree for the considered networks. These
distributions are characterized by a decreasing curve
starting at the first level. Generally, these curves
are similar to those obtained for the inter-ring
degrees, except that the present curves are wider and
the collaborative network has a well-defined peak
at the first level. Another observation is that the
average concentric common degree tends to be higher
at the initial concentric levels, which is a
consequence of the fact that the largest hubs present
in the BA model tend to be reached sooner,
providing bypasses to the other nodes and therefore
left-shifting the peak and reducing the number
of concentric levels. This is the main reason why
the peak in the BA networks tends to be displaced
further to the left-hand side than in the random network.
As with the inter-ring degree, the distribution of
the concentric common degree for the collaborative network
results in a mix of the characteristics of the curves
for both models, supporting the evidence that the
topological location of the hubs tends to be more
widespread than in the BA model.

As shown in Figure 8, the concentric clustering
coefficient curves are very distinct between the two
simulated models. The curves for the ER model
present a fast increase in value, followed by a
plateau and then a rapid decrease. In fact, the
nodes at each ring of those networks are
characterized by low interconnectivity. The concentric
clustering coefficient curves obtained for the BA and
the collaborative networks present much higher
values and involve a sharper peak. In addition,
the curve for the co-authorship network tends to
present another peak along the last levels.

The convergence ratios obtained for each of the
considered network models, shown in Figure 9,
yielded the most distinct curves between the
simulated models and the collaborative network. The
curves for the BA and ER models are
characterized by similar behavior and a
peak at the last levels (except for the regular
models), along which the concentric expansion tends
to saturate, i.e. after the peak is reached. Note
also that sharper peaks tend to be obtained for
high values of $k$. The collaborative curve presents
a wider peak, with its center displaced to the
left-hand side, far away from the average shortest
path length. This is a consequence of the fact that,
unlike what is obtained for the BA model, the hubs
are reached gradually along the concentric levels
when starting from most nodes.
Figure 7: Hierarchical common degree
measurements with the respective ± standard deviations
obtained for the considered models.

Figure 8: Hierarchical clustering coefficient
measurements. Note the higher values of standard
deviation when compared with those of the other
measurements.

Figure 9: Convergence ratio measurements for the
considered networks.



Indeed, as verified experimentally in [10], the
position and width of the peak of the convergence
ratio are ultimately defined by the distribution of
hubs along the hierarchies.

The fact that the convergence ratios obtained
for the co-authorship network, shown in Figure 9,
tended to be relatively uniform along the
hierarchies indicates that the hubs are not highly
interconnected. In other words, the hubs tend to cover
different portions of the network. If we assume
that the hubs are more likely to correspond
to leading scientists, it can be inferred from the
convergence ratio results that multidisciplinarity
would ultimately be implemented by the
respective co-authors connected to each hub.

Among all the considered measurements, the
concentric common degrees and concentric
clustering coefficients were found to provide the most
distinct curves for each network, revealing more
information about the distribution of hubs and the
interconnectivity along the concentric levels.
Because the collaborative network curves have the
highest values of standard deviation, its
individual nodes may have distinct concentric
features and can be grouped into clusters of similar
features. The remainder of this section presents
the results obtained by applying an
agglomerative hierarchical clustering algorithm to the
concentric clustering coefficients and convergence
ratios. The following graphs show the average
± standard deviation of the concentric
measurements obtained for each cluster at the respective level of the
dendrograms. Starting at the right-hand side of the
tree, the nodes are progressively merged on the
basis of the similarity of their concentric clustering
coefficients, yielding a taxonomical
categorization of the nodes into meaningful clusters,
identified by each branching point in the tree.

Figure 10 shows the graphs for the concentric
clustering coefficients and the respectively obtained
clusters. The mean degree and number of nodes
of each cluster are given above each graph. As
can be seen, at the first branching point, leading to the
first clusters (i.e. B and C), about 17% of the nodes
of the network are in cluster B and, unlike
the curve for all nodes (i.e. A), their clustering
coefficient distribution shows only one peak, revealing
that those nodes have distinct hierarchical
behavior when compared with those from cluster C.
Note that cluster B leads to a great variety of types
of curves: while the curve for cluster H has a peak
centered at concentric level 2, those for
cluster I are centered at level 1. Cluster E shows that
about 25 nodes have a very specific distribution,
including two peaks centered at levels 1 and 2 (as in
J) or two peaks centered at levels 2 and 3 (as in K).
The branch corresponding to cluster C shows two
basic structures (G and F), both including the
second peak, but the first peak is wider for cluster
G than for F.

The results obtained for the convergence ratio
can be seen in Figure 11. The main distinction
between the final groups is the position of the
center of the peak of each curve, varying from peaks
centered at concentric level 4, as in cluster
K, to around level 8, as in M. In the case of the
convergence ratio, this displacement indicates how
close the groups of nodes are to the hubs.

Because every node in the collaborative network
is labeled with the author's department, the
percentage of nodes belonging to each department can be
given for each obtained group. The results can be
seen in Figures 12 and 13, which show, respectively,
the departments and the areas of knowledge represented
in the clusters obtained using the concentric clustering
coefficient. Only the seven most representative
departments of a cluster are shown; the other
departments are merged into a single section of the pie
chart. Note that the most representative clusters
for department segregation are located at the
second branching points.

Figure 12 shows the pie charts of the sets of
nodes obtained by hierarchical clustering
considering the clustering coefficient. Each pie chart
includes a legend showing the most
highly represented institutes and their relative
percentages at each level. For example,
considering branching level 1 (i.e. groups D
to G), the percentages of each institute
over all pie charts should add up to 100%. This is the case
of institute H1 at branching level 1, which
presents a participation of 17.1% in pie chart D,
3.2% in chart E and 75.3% in chart G.
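The normalization described above, in which each department's nodes are distributed over the clusters of one branching level so that its percentages add up to 100%, can be sketched as follows. The inputs `labels` and `clusters` and the helper names are hypothetical, and `top_departments` mimics keeping the seven most represented departments of a chart.

```python
from collections import Counter

def department_shares(labels, clusters):
    """For one branching level: percentage of each department's nodes that falls in each cluster.
    labels: node -> department; clusters: cluster name -> list of nodes (hypothetical inputs)."""
    totals = Counter(labels[u] for nodes in clusters.values() for u in nodes)
    shares = {}
    for name, nodes in clusters.items():
        counts = Counter(labels[u] for u in nodes)
        shares[name] = {dept: 100.0 * c / totals[dept] for dept, c in counts.items()}
    return shares

def top_departments(share, k=7):
    """The k most represented departments of one pie chart, largest share first."""
    return sorted(share.items(), key=lambda item: item[1], reverse=True)[:k]

labels = {1: "H1", 2: "H1", 3: "E1", 4: "H1", 5: "E1"}
clusters = {"D": [1, 3], "G": [2, 4, 5]}
shares = department_shares(labels, clusters)
print(top_departments(shares["G"]))  # H1 first (two of its three nodes), then E1
```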

For most levels, the pie charts do not present
marked homogeneity as far as the nature of the
institutes is concerned (i.e. human, exact and
biological areas). The departments are named according to
their knowledge area followed by a number, where
H# stands for human, E# for exact and B# for
biological areas. However, we found the remarkable
result that a large part of the most representative
institutes in chart C were not only from the
biological area, but also located in the same city in the
countryside of Sao Paulo State. Therefore, these
two factors seem to have implied a distinctive
pattern of collaborations. In addition, by taking into
account the respective clustering coefficient
measurements in Figure 10, it becomes clear that these
collaborations involve a peak of clustering
coefficient at hierarchical level 1. This provides
further indication that the collaborations in pie chart
C indeed take place at a more localized level,
implied by the geographical position of those
institutes. This more localized collaboration pattern
remains in the next branching, i.e. in pie chart F. The
institutes in the sister chart, i.e. G, have a more
widespread collaboration pattern, as indicated by
the wider hierarchical clustering coefficient
signature in Figure 10. As far as the subdivision of
chart B is concerned, one of the sister charts (i.e. D)
contains a substantially higher overall percentage
of institutes than chart E. Therefore, we will not
consider the latter and its respective subdivisions J
and K in the following discussion. Charts D and E
are characterized by the absence of the secondary
peak in the clustering coefficient signature
(compared with charts F and G). The institutes in chart
D are heterogeneous as far as the scientific area is
concerned, but are all located in the capital (except
for E5).
Figure 10: Graphs of the average ± standard deviation of the concentric clustering coefficient obtained
for the co-authorship network. Only four levels of the dendrogram obtained by the agglomerative
hierarchical clustering are shown.

Figure 11: Graphs of the average ± standard deviation of the concentric convergence ratio obtained for
the co-authorship network. Only four levels of the dendrogram obtained by the agglomerative
hierarchical clustering are shown.

12 




H 





■ E0 - 100.0% 

■ H0 - 100. D% 
| | HI - 100.0% 

■ E1 " 100-0% 
| | H3 - 100.0% 

■ B1 - 100. 0% 
| | <10O. 0% 



K 




M 



N 



O 



■ E4 ~ 4 -3% 

■ H3 - 3.8% 



■ HI ~ 2 - 5 ' 1 

□ <2.5% 






□<15.2% 



□ H3 



□ hi 



| | H6 
| | <0.6% 



■ HI - 1-3% 
| | H3 - 1.0% 

■ EO - 0.3% 






- 26 .6% 
26. 1% 




□<21.5% 



■ HI - 53.8% 

■ H3 - 37.5% 

■ - 34.8% 

■ HO - 26.8% 

■ E5 - 22. 5% 

■ El - 21.2% 
□ <21.2% 



Figure 12: Pie charts with the percentage of nodes in each department for each cluster obtained by 
using the Clustering Coefficient. 
Figure 13: Pie charts with the percentage of nodes in each area of knowledge for each cluster obtained
by using the Clustering Coefficient.

Figure 13 shows the distribution of the scientific
areas for each respective pie chart of Figure 12.
Most of the cases in the upper branch (i.e. B,
D, E, I, J, K) have a predominance of the human
area. In contrast, only cases G and O resulted
in a predominance of the human area in the lower
branch. The biological area is over-represented in
branches C, F, L and M, in agreement with
the above discussion. The exact area is more
uniformly distributed among all cases,
predominating only in cases H and N.

7 Concluding Remarks 

One of the interesting applications of complex
networks has been the investigation of patterns
of authorship and collaboration in scientific
production (e.g. [...]). The current
work has extended such investigations by
considering concentric measurements, which are
capable of providing additional information about the
connectivity around each node (e.g. [7, 10]), as well
as by organizing the results with pattern
recognition methods (more specifically, hierarchical
clustering). We applied this
methodology to real data related to the scientific
collaborations between authors from the several institutes
that compose the University of Sao Paulo (USP). A
number of interesting results have been obtained.
First, we found that the geographical position of
institutes tended to produce well-defined groups,
characterized by a more localized clustering
coefficient. This suggests that collaborations are
more intense among institutions in the same
city. We also found that the three main
scientific areas tended to be differently represented in
the obtained groups, with the exact sciences
tending to appear more uniformly in the majority of
groups. This indicates that this area is
characterized by a less uniform pattern of collaborations.
Future works could address additional
measurements and clustering methods, as well as the study
of co-authorships with external institutions.